Here is a quick summary of our data. We see how many times values occured in our data frame as well as what were the quartiles values for our data.
summary(sales_final)
## sku customer_name ship_city ship_state
## FNG00015:10 CUSTOMER 1:58 TLAJOMULCO DE ZUNIGA:12 MX :26
## FNG00014: 8 CUSTOMER 3: 9 CUAUTITLAN IZCALLI :10 TB :18
## FNG00030: 7 CUSTOMER 6: 9 VILLAHERMOSA :10 JA :15
## FNG00031: 7 CUSTOMER 2: 2 CHALCO : 8 NL :14
## FNG00033: 7 CUSTOMER 4: 2 CULIACAN : 8 SI : 8
## FNG00043: 7 CUSTOMER 5: 2 MONTERREY : 8 CH : 1
## (Other) :37 (Other) : 1 (Other) :27 (Other): 1
## zip_code shipping_method package_24_shipment cases_sold
## 86280 :12 Delivery:83 Min. : 175 Min. : 252
## 45679 : 8 1st Qu.: 8898 1st Qu.: 12264
## 56640 : 8 Median : 20664 Median : 24840
## 66550 : 8 Mean : 58039 Mean : 47306
## 80130 : 8 3rd Qu.: 73386 3rd Qu.: 62118
## 54769 : 6 Max. :326400 Max. :213612
## (Other):33
## avg_spend
## Min. : 1.566
## 1st Qu.: 7.142
## Median :12.391
## Mean :10.950
## 3rd Qu.:15.707
## Max. :16.975
##
Now lets take a much closer look and see in which cities customers are buying these water cases.
ggplot(sales_final, aes(x = customer_name, fill = ship_city))+
geom_bar(color = 'black')+
labs(x = 'Customer', y= 'Where that customer is buying from', fill = 'City')
JA
## [1] "LAJOMULCO DE ZUNIGA" "EL SALTO"
MX
## [1] "TULTITLAN" "CHALCO" "CUAUTITLAN IZCALLI"
## [4] "TEOLOYUCAN" "RAMOS ARIZPE"
NL
## [1] "GUADALUPE" "MONTERREY" "ESCOBEDO"
## [4] "CIENEGA DE FLORES"
SI
## [1] "CULIACAN"
TB
## [1] "VILLAHERMOSA TABASCO" "VILLAHERMOSA"
Here we show which city had the most sold cases. What was the breakdown of the shipments to each city. We can see which cities are the ones who are buying the most cases and which ones are buying the average amount of around 47306 cases. We can make cost efficient decisions in the next quarter so to maximize our orders with those cities that are buying the most cases.
print(cities)
## # A tibble: 16 x 2
## ship_city total_sold
## <fct> <int>
## 1 CHALCO 410328
## 2 CHIHUAHUA 12096
## 3 CIENEGA DE FLORES 4032
## 4 CUAUTITLAN IZCALLI 930676
## 5 CULIACAN 541140
## 6 EL SALTO 12628
## 7 ESCOBEDO 197821
## 8 GOMEZ PALACIO 4032
## 9 GUADALUPE 60192
## 10 MONTERREY 670048
## 11 RAMOS ARIZPE 616
## 12 TEOLOYUCAN 76552
## 13 TLAJOMULCO DE ZUNIGA 501454
## 14 TULTITLAN 37392
## 15 VILLAHERMOSA 177761
## 16 VILLAHERMOSA TABASCO 289657
pie
Here we see the frequency of cases were bought by a specific city. We can see which cities are buying larger orders at a time or who is buying lower quantities.
ggplot(sales_final, aes(x = cases_sold, fill = ship_city))+
geom_histogram(color = 'black')+
labs(x = 'Cases Sold', fill = "City")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here we see that I divided up the information and plotted it on a map of Mexico the busiest place that the cases were shipped for Q1 of 2017. This is good to visualize where we can focus more of the shipments in the future as to be sure to maximize shipments and profits.
ggplot(mexico, aes(x = long, y = lat)) +
geom_polygon(aes(group = group), fill = "white",
color = "gray40", size = .2) +
geom_point(data = sales_final2, aes(x = long, y = lat, size = cases_sold, color = ship_city))+
scale_size_continuous(range = c(8,15))+
labs(color = 'City of Destination', size = "Amount of Cases Sold", title ='Comparing the volume of shipment to each city in Mexico')
Here we have a boxplot that shows us the median price of shipping across all the cities in Q1. We see what the maximum value and minimum value were for shipping to that respective city. The average cost of shipment was $10.95.
ggplot(sales_final, aes(x = reorder(ship_city, avg_spend, median), y = avg_spend))+
geom_boxplot()+
theme_bw()+
theme(axis.text.x=element_text(angle=45,hjust=1))+
labs(x = "City", y = "Shipment Cost per Case(in Dollars)")
Here we are showing the average shipping price of shipping cases of water against how many cases where sold at that price point. We then want to see how many complete 24 packs of water of those cases sold compare against to the price of the shipment. Since different products have different bottle sizes and pack sizes, so “24-pack equivalent” cases are used as a scale for a comparison between different product types.
ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
geom_point(aes(size = package_24_shipment), alpha = 1/3)+
labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')
Now we want to smooth out this data to see what sort of relationships each state state has with regards to the cases sold, shipping cost, and 24 pack equivalents. We can determine what relationship our data has form this information. As cases start to get larger we see which states comsume the most of our product, while others start to fall off or comsume less. We can also see at what amount states start to increase their consumptions. This informations is valuable as it would lead to know how many cases to estimate to each state which can result in having a more efficient second quarter.
ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
geom_point(aes(size = package_24_shipment), alpha = 1/3)+
geom_smooth(aes(color = ship_state), se = FALSE) +
labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')
## `geom_smooth()` using method = 'loess'
With the limited data that we have of cases sold in the Q1 we will create a simple cases sold forecast to see what we can expect in the next quarter as far as cases sold go. Here we will provide insight into the values that were used to make the best fitted stastical prediciton.
Since we are working with predictions we need to have a fitted alpha and beta that will fit our data by having the lowest sum of squared errors of prediction so that we know our data is fitted to the best values and will yield more accurate results.
alpha
## [1] 0.1727234
beta
## [1] 0.1482043
This is what the predicted percent we can see in the growth of cases sold in Q2. We can see the predicted percentage growth over 30, 60, 90 days into the future.
growth_over_30
## [1] "4.62%"
growth_over_60
## [1] "9.4%"
growth_over_90
## [1] "14.2%"
Finally here is a graphical representation of out cases sold forecasting in Q2. We have out blue line as out predicted forecast and our upper and lower bounds with the red lines. When we have the actual data we can see how close the predicted data was to the actual data and calculate the residual of the values.
plot.ts(DT$V1, main = "Cases Sold Predictions for Q2", xlab = "Days", ylab = 'Cases Sold', xlim = c(10,185), ylim = c(0, 200000))
lines(cases_sold.mean$fitted[,1], col = 'green') #the fitted is for all the fittted values, the 1 is for the first column
lines(cases_sold.pred[,1], col='blue') # the fit columns
lines(cases_sold.pred[,2], col = 'red') #upper column
lines(cases_sold.pred[,3], col = 'red') #lower columns
With this data we can now make accurate estimations for quarter 2 of sales and freight. We can estimate how much it would cost to ship to these same locations in quarter 2 as well as know about how much each respective state and city will consume in the future. The data also tells us what the perfect amount of cases would be for each state as to now have them start to consume less product; therefore, making our shipment process much more efficient as we will know how much a specific state/city will expect to consume.